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ABSTRACT 



We investigate potential biases in the measurements of exoplanet orbital parameters obtained 
from radial velocity observations for single-planet systems. We create a mock catalog of radial 
velocity data, choosing input planet masses, periods, and observing patterns from actual radial 
velocity surveys and varying input eccentricities. We apply Markov Chain Monte Carlo (MCMC) 
simulations and compare the resulting orbital parameters to the input values. We find that a 
combination of the effective signal-to-noise ratio of the data, the maximal gap in phase coverage, 
and the total number of periods covered by observations is a good predictor of the quality of 
derived orbit parameters. As eccentricity is positive definite, we find that eccentricities of planets 
on nearly circular orbits are preferentially overestimated, w ith typical bias of 1 — 2 times the 
median eccentricity uncertainty in a survey (e.g., 0.04 in the Butler et al. 20061 catalog). When 



performing population analysis, we recommend using the mode o f the margin a lized posterior 
eccentricity distribution to minimize potential biases. While the iButler et all (|2006l ) catalog 
reports eccentricities below 0.05 for just 17% of single-planet systems, we estimate that the true 
fraction of e < 0.05 orbits is a bout .fn.n5 = 38 ± 9 %. For planets with P > 10 days, we find 
Jo. 05 = 28 ±8% versus 10% from lButler et all (|2006r ). These planets either never acquired a large 



eccentricity or were circularized following any significant eccentricity excitation. 
Subject headings: methods: statistical - planetary systems - techniques: radial velocities 

1. Introduction 



One of the most surprising properties of the ~300 extrasolar planets found by radial velocity sur- 
veys is that their orbital eccentricities are much higher than those of the planets in the Solar System 
(Figure Q}. Indeed, after excluding planets with short orbit periods (P < 4 days) that have likely been 
influenced by tidal circularization, only ^17% of these planets have eccentricities < O.OfOJ Several groups 
have proposed mechanisms able to excite ex trasolar planet eccentricities to the levels foun d in radial ve- 
locity surveys (ITremaine h Zakamskal 120041) . Some examples are planet scattering (e.g., Rasio fc Ford 



1996; IWeidenschilling fc Marzarilll996l ). perturbati ons from a wide binary companion (e.g., Holma n et al 



Mal mberg et al.l l2007) 



1997), and perturbat ions from passing stars (e.g., Laughlin fc Adanisl 19981: Zakamska fc Tremaine 2004 : 



Subsequent studies of planets in binary systems ( Marzari et al.ll2005 


Takeda & Rasiol2005; Fabrvckv & Tremaine! 


2007), close stellar encounters ( 


Malmberg & Davies 


2009) and planet-planet scattering (Ford et al. 


2001: 


Marzari & Weidenschillingl2002: 


Adams & Laughlin 20031 Chatteriee et al. 2008: 


Ford & Rasiol2008Ujuric & Tremaine! 


20081) have made detailed predictions for the distributions of extrasolar planet orbit properties. These studies 



typically use the distribution of the published best-fit orbit eccentricities as a proxy for the true underlying 
eccentricity distribution. They are therefore vulnerable to any significant differences between the measured 
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and intrinsic distribution s. To mitigate the e ffects of uncertainties in the distribution — and in deviation 



from standard practice — Ford fc Rasio ( 20081) performed Bayesian analyses of single planet systems, so that 



they could account for the uncertainty in the measurement of orbit parameters when comparing the predicted 
distribution to observations. While this approach has the advantage of emphasizing well-measured planets 
and de-emphasizing poorly measured ones, it still may be affected by biases. For example, since eccentricity 
is positive definite, an eccentricity measurement for a planet on a circular orbit can only overestimate the 
eccentricity. Current observational estimates of the eccentricity, or of the posteri or distributions for the 



eccentricity, are likely biased towards larg er eccentricities for nearly circular planets (|Lucv fc Sweenevlll971 
Shen & Turnerll2008t lO'Toole et alJboO 



There are several reasons to be interested in accurate measurements of extrasolar planet eccentricities, 
especially at small eccentricities where measurement biases are most significant. For example, precise deter- 
mination of eccentricity distribution is important for short-period planets which can be circularized by tidal 
interactions with the star. By examining which of the planets with short tidal times have fully circularized 
and which retain a significant eccentricit y, one can calibrate models of tidal dissipation and identify recent or 
ongoing eccentricity excitation episodes ( Ford fc Rasiol 20061 Matsumura et al. 20081 Nagasawa et al. 20081: 



lty 
)9) 



Batvgin et al.l 120091) . For planets beyond the reach of tidal circularization, an accurate eccentricity distri- 



bution, and in particular the fraction of low-eccentricity planets, may provide a probe of planet formation 
processes. For example, current data su ggest that planet-planet scattering models pred ict fewer planets on 
nearly circular orbits than are observed (|Juric fc Tremaindl2008l : IChatteriee et al]|2008l ). 



In this paper we quantify the biases of extrasolar planet eccentricities when measured from radial velocity 
observations. We create a mock catalog of radial velocity data, choosing planet masses, orbit periods and 
observing patterns to mimic those of actual radial velocity surveys (3D). Using a Bayesian framework and 
Markov chain Monte Carlo (MCMC) simulations, we calculate the posterior probability distribution for each 
mock data set, along with several summary statistics for each data set and each orbit parameter (Sj3|). We 
examine several different eccentricity estimators and make recommendations for minimizing the effects of 
bias. More generally, we investigate the quality of orbit parameter determination in modern radial velocity 
surveys and investigate the reliability of determining orbit parameters as a function of the quality of data 
sets in 21 Based on these results, we estimate the underlying distribution for the eccentricities of extrasolar 
planets in $5) We summarize our results in Ej6j 



2. Experimental design 

Our investigation is simple in concept: we input a set of realistic orbit parameters for an extrasolar 
planet, generate a radial velocity curve for its host star, compute the orbit parameters from the radial 
velocity curve in a blind experiment and compare output with input. However, the many choices necessary 
to produce a realistic radial velocity curve make setting up such an experiment quite a challenge in itself, 
especially if we aim to provide a prescription for correcting the biases in real surveys. For example, generating 
radial velocity curves from scratch requires not only assumed intrinsic distributions of planet and host star 
properties but also a way to account for the varying sensitivities, observing strategies and target selection 
methods of different surveys. We sidestep most of these challenges by taking our input orbit parameters, 
observational errors and time sampling directly from the real systems in the catalog of Butler et al. (2006; 
hereafter B06). These authors present orbital solutions for 172 exoplanets within 200 pc of the Sun and 
provide public radial velocity data sets for those from the California Carnegie Planet Search. It is the largest 
refereed catalog of exoplanets discovered via radial velocity observations and it contains most of the systems 
for which radial velocity data are publicly available. Our approach is then to investigate the possible biases 
in determination of orbit parameters in this particular catalog. 

To create mock planet systems, we take the best-fit period P, velocity amplitude K, argument of 
periastron lo and time of passage of the periastron t p of the dominant planet in each of the 91 systems in the 
B06 catalog with fewer than 90 observations (the others take prohibitively long to analyze in large numbers) , 
and use times of observations and observational errors from the corresponding radial velocity curves. Among 
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these 91 systems, the median data set contains 40 data points, covers 8 planet periods and has a radial 
velocity uncertainty of 3.3 m/s. For each system we generate synthetic radial velocity data sets v (t) for five 
input eccentricities e = 0, 0.05, 0.1, 0.3, 0.6 using the Keplerian model |Murrav fc Dermott 19991) 



v(t) = K [cos(w + T(t)) + e sin(w)] , (1) 

where T is the true anomaly. The argument of periastron is measured from the line in the orbit plane where 
it intersects the sky plane and the planet is approaching the observer. At each actual observation time £j, 
we generate a simulated velocity (v%) as a random variable normally distributed about a "true" value of 
v(ti) + C with dispersion \J o^ a i + ct 2 ,. Here C is the constant systemic velocity of the system, cr b s ,i are 
measurement uncertainties taken directly from the real data sets and <jj = 3.5 m /s is a fixed v elocity "jitter" 
adopted to account for astrophysical noise such as stellar photospheric activity ( Wright 20051) . 



Each resulting mock radial velocity data set consists of the times of observation ti, mock radial velocity 
measurements v%, and errors of measurement <7 bs,i- For each input eccentricity, we create five realizations 
of the radial velocity data with a different set of Gaussian random variables for each realization, using 



°obs i + a J as t ne width of the Gaussian distribution. We then perform Bayesian analyses of the resulting 
2275 (= 91 systems x 5 eccentricities x 5 realizations) mock radial velocity data sets. 



3. Analysis of mock and observational data 



Th e Bayesian ana ly sis of radial veloc i ty ob s ervations is descr i bed b y lFordl |2005h : iGregorvl (jiooih : iFordl 



(|2006l ); iGreeorvi (|2007f ) ; iFord fe Gregory! ([20071 ); iBalan fc Lahavl ([20091 ) ■ among others. In this Section, we 
review the essential elements and point out the details that differ from those in previous work. 



3.1. Priors and likelihood 



To ensure an efficient performance of the global search algorithm ( §3.2j) . we adopt simple, analytic, 
separable priors that provide a reasonable first approximation to the exoplanet distribution discovered by 
radial velocity planet searches. They take the form 



p a ii(P, K, e,w, M ,C, aj) = p P (P)p K (K)p e (e)p UJ (uj)pM(M )pc(C)p r7 , (aj), 



(2) 



where Mo is chosen to be the mean anomaly at the middle of the time series. The prior for the orbit period 
is uniform in logarithm of the period: pp{P) = P -1 / ln(P ma x/Pmin)- For the global search ( WS.2\i we impose 
hard limits, P m i n — 2 days and P ma x = ""(/max — t m - m ), where i m ax~imin is the time interval between the first 
and the last observations. These limits are chosen to be far away from the solution for any planet clearly 

The priors for the radial velocity amplitude and jitter are 
120051 ) of the form p x (x; x ,x max ) = (1 + x/xq)' 1 / ln(l + x nlllx /x ), where 

- 1/3 /VT 



detected from the present data (Ford |2006) 



modified Jeffreys priors (|Gregorvl ll 
x is either K or <jj. The hard upper limit for the amplitude, K n 



(P/Pnin)" 



e 2 x 1690 m/s, 



corresponds to the radial velocity amplitude produced by a ~ 10Mj up planet orbiting a solar mass star in 
the plane containing the line of sight. The upper limit for the jitter, cx/ iinax = 1690 m/s, is based on the 
same amplitude for a circular planet with period P m in- The scale parameters (Kq and crj,o) are set to 0.1 
m/s so as to prevent a peak in the posterior at small amplitudes due to a divergence at zero in a Jeffreys 
prior. The prior for the eccentricity is uniform between zero and unity, and we further discuss the effects 
of this assumption in i j4.21 The priors for the argument of the periastron and for the mean anomaly are 
uniform between and 2tt. The prior for the velocity offset is uniform and not bounded. 

We assume that each radial velocity observation is independent and normally distributed about the true 

a j. Therefore, the likelihood is given by 



value with a variance cr 2 hs i 



L(d\0) = (2tt) 



-N/2 



' N 

n 

,i=l 



-1/2 



exp(- X 2 (0)/2), 



(3) 
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and 

£= 1 obs,z J 

The posterior probability distribution is given by Bayes' theorem, 

p(0\d)=p(0)L(d\0)/p(d), (5) 

where p(d) = J p(0)L(d\0) dO is a normalizing constant that need not be computed for our purposes of 
parameter estimation within a single model. 

We perform our Bayesian analysis in two steps. First, we perform a global search to identify the 
dominant mode(s) of the posterior distribution ( fl3.2|) . The output of this global search step is used to 
generate the initial states of the Markov chains for a MCMC analysis ( tj3.3j) . 



3.2. Global Search 

In the global search we perform a brute force integration over the variables {P, e, Mo} using two quasi- 
random number generators (QRNGs), perform a one-dimensional numerical integration over aj, and expand 
the arguments of the expon ents using Taylor series in the remaining integrals (the Laplace approximation; 



Cumming 2004; Ford 2008). For the outer loop, the first QRNG uses a one-dimensional Sobol sequence to 
generate the orbit frequency 1/P for periods between P min = 2 days and P max = 7r(£ max — t m j n ). We evaluat e 
the (unnormalized) posterior (i.e., prior times likelihood) at each of N per periods. Following iGregorvl (|2007| ). 



7V pcr = max|l0 5 ,min[l0 4 ,2ceil[P max /P min -l]ceil[l + 1.6x (S/N) x (i max - t min ) / (P max VN)]}} , (6) 

r n i x i 2 

where S/N = N^ 1 ' 2 Yli=ii v i ~ ^t>f) 2 /Cobsi * s an estimate of the combined signal to noise and Cm is 
the best-fit constant velocity for the given data set. 

At each sampled orbit period, the second QRNG uses a two-dimensional Sobol sequence to generate the 
eccentricity e and mean anomaly at epoch Ma uniformly over their full range of eccentricities and periastron 
angles. This Sobol sequence contains 256 to 4,096 pairs of e and Mq. The number of samples is chosen so 
that the accuracy is better than 10% (20%) depending on whether the period being considered contributes 
more than (less than) one part in 10 s of the current estimate for the posterior probability marginalized 
over all periods. The accuracy of the (unnormalized) marginalized posterior probability for a given period 
is estimated based on the first and second half of the Sobol sequence after 256, 512, 1024, 2048, and 4096 
samples. 

For each set (P, e, Mo), we integrate numerically over oj using the adaptive Gauss-Kronrod integration 
method implemented in GNU Science Library. For each set (P, e, Mo, crj), we estimate the (unnormalized) 
posterior probability marginalized over all the remaining "linear" model parameters K, cj, C using the 
Laplace method. By changing variables from K and u) to Kcoslo and Ksimo, we can write the radial 
velocity model as a linear function of the remaining parameters. For fixed values of (P, e, Mp,gj), there is 



then a single global minimum of \ 2 which can be efficiently solved for via matrix algebra ( Wright fe Howard! 



2009). We use singular value decomposition to solve for the best- fit values of the linear model parameters. We 
then approximate the (unnormalized) marginalized posterior probability based on the value of the prior, the 
likelihood, the determinant of the inverse covariance matrix, and a Jacobian all evaluated at the best- fit value. 
The Jacobian, J = 1/ \K\, is needed since the Laplace approximation depends on the model parametrization. 
Our model is linear in the variables K cos uj and K sincj, but we use priors that are uniform in K and uj. 

After integrating over all model parameters, we identify the dominant mode of the posterior probability 
distribution and draw samples of the model parameters from this mode for use as initial states in the Markov 
chain Monte Carlo analysis described in ^3-31 The above global search algorithm is parallelized using OpenMP 
and is applied separately to each data set considered. In principle, the above global search algorithm is fully 
automated and does not require human intervention to provide knowledge about the location of the best-fit 
model. In practice, some data sets failed initially and were rerun with iVp er larger by a factor of 4. 
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3.3. MCMC analysis 



For each data set, we perform a MCMC analysis closely following the methods of Ford (2006). All 



our Markov chains take steps in the variables 1/P, K, e, esin(a; + Mo), ecos(w + Mo), w + Mo, and lncj. 
The velocity offset is explored using Gibbs sampling. However, when two model parameters are strongly 
correlated, steps in the two variables individually can be very ine fficien t . Mu ch larger steps are possible 
if they are taken in sets of variables that have weaker correlation. Fordl ( 2006 ) identified expanded sets of 



stepping variables that greatly accelerate the convergence of Markov chains for a variety of types of systems 
and radial velocity data sets. We use this expanded set of stepping variables to help our Markov chains 
converge more rapidly. Therefore, for data sets where the best-fit solution (identified from the above global 
search) has a period of greater than 10 days, we also allow our Markov chains to take steps in the variables 
w, ifcosu;, JsTsinw, lo + To, t p , and YaKy/\ — e, where To is the true anomaly at epoch. 

Before calculating the Markov chains to be used for inference, we first calculate chains with variable 
step scale factors so as to determine step scales that result in efficient exploration of parameter space. In the 
vast majority of cases, our standard initial guesses for the step sizes resulted in convergence on step scale 
factors that gave acceptance rates of ~ 40%; in about 5% of cases we had to manually adjust the initial 
guess for the step size for the algorithm to identify usable step scale factors. We then discard these initial 
chains and produce Markov chains with fixed step scale factors for parameter estimation. 

For each data set, we compute five Markov chains consisting of at least 10 6 states each which we use to 
test for any signs of non-convergence as described below. Once a set of Markov chains is accepted, we draw 
a random sample of 10 4 states from the final 80% of all five chains. This sample is used for inference such 
as calculating summary statistics as described in jj4.il 

To flag non-convergence we calculate the Gelman- Rubin test statistic and estimate the correlation l ength 
for ea ch for the variables listed in £13.11 (regardless of the orbit period), as well as Mo, as described in iFordl 



(2006). As a practical matter, our Markov chain calculations use a prior uniform in InP without any bounds. 
Therefore, we require that the chain converge upon a plausible range of orbit periods with significant weight 
between P m i n and P max . In several cases, much longer chains were needed in order to pass the above 
convergence tests. 

Of the 2275 mock data sets, we exclude 17 which failed convergence tests despite attempts to modify 
step size and direct the chain toward a different global search solution. We exclude an additional 31 chains 
which nominally converged but whose final periods arc too poorly determined (a[P]/P > 0.5, where cr[P] is 
defined in detail in the next section). We further exclude another 31 chains whose output period deviates 
too much from the input period, that is, those with log 10 |-Pout/-Pin| > 0-3. Reasons for the poor performance 
of most of the 79 chains thus rejected (3.5% of the total number investigated) are easily identified (e.g. too 
few data points, low signal from the planet, or an orbit period commensurate with yearly gaps in observing 
coverage). 



4. Determination of orbit parameters 

To interpret our MCMC results, we calculate from each multi-dimensional posterior sample a small set of 
summary statistics for each orbit parameter. To do this, we marginalize each chain over all orbit parameters 
except one to estimate the one-dimensional posterior probability distribution for this one parameter. We 
then find summary statistics to describe the one-dimensional distributions. The summary statistics typically 
reported in planet discovery papers are the best-fit value of a given orbit parameter and an uncertainty often 
interpreted as a 1-sigma confidence interval with the model of a Gaussian distribution in mind. However, for 
an arbitrary one-dimensional marginalized posterior distribution, care must be taken to choose parameters 
that accurately describe the sample. In ^4. II and ^4.21 we consider several options. 
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4.1. Output quality for one-dimensional distributions 

As candidates for the best-fit value of an orbit parameter we compare the median, mean, and mode 
of the one-dimensional marginalized posterior distribution. We estimate the mode — the location of the 
maximum probability density — by locating the densest clump of values in the 10 -state subsample from 
the output Markov chain. We proceed by locating the smallest contiguous interval containing 9999 of the 
10000 sample values, then that containing 9998, then 9997, and so on. When the smallest contiguous interval 
containing k — 1 sample values shifts away from the position of the smallest interval containing k samples 
by more than 0.2k samples, we assume that we have reached the scale of density irregularities present due 
to finite sampling of the probability distribution. We therefore take the interval containing k values as 
representative of the position of the densest clump of samples and take the median of the k values — the 
center of the clump — as an estimate of the modeQ 

We define a 68% credible interval as the smallest contiguous interval containing 68% of all points in the 
chain. For a Gaussian distribution, this percentage corresponds to the fraction of the distribution within 
0.994458 standard deviations of the mean; by analogy, we define the measurement precision as 1/(2-0.994458) 
times the credible interval. Figure [5] shows example marginalized posterior distributions for periods and 
velocity amplitudes. The comparison between our observed distributions and a Gaussian function can be 
parametrized by comparing the 68% credible intervals to analogously defined 95% and 99% ones. The median 
relationships between these values suggest that most of our posterior distributions for period and velocity 
are close to Gaussian, with ergg only about 6% above am- Therefore, the non-Gaussianity of our posterior 
distributions is not nearly as severe as that found by O 'Toole et al. (2008), who used x 2 minimization to 
obtain orbit parameters. These authors find a factor of 5 to 10 difference between values of a obtained from 
68% and 99% credible intervals. Such deviations from Gaussianity are seen in only 9% of our systems. None 
of our conclusions are affected by the difference between ergs and crgg, so hereafter we use "precision" 
to mean a defined on the basis of the 68% credible interval. We indicate the relevant orbit parameter in 
brackets, e.g., o[K] is the precision of the velocity amplitude measurement. 



4.2. Comparison of summary statistics for eccentricity 

The left panel of Figure [3] shows the marginalized posterior distribution for eccentricity for a system 
with input eccentricity (the same as shown in Figure [2|) . Clearly, the one-dimensional output eccentricity 
distribution is highly asymmetric for nearly circular orbits because eccentricity is positive definite: the best- 
fit output eccentricities are always positive, and so are the minimal eccentricities allowed by the credible 
intervals defined above. Of the three measures discussed in the previous section (the mode, the median and 
the mean), the mode of the eccentricity distribution is the closest to the true input value. 

The middle panel demonstrates the behavior of the Markov chain marginalized over all parameters 
except e and oj. The chain is displayed in the plane of the "two-dimensional eccentricity" components 



e sin u and k = e cos oj ( Murray &: Dermott 



19991 ). In these variables a random- walk Markov chain 



can easily explore the parameter space near the input value e = 0, so we investigate eccentricity measures 
written in terms of h, k. A constant probability density in the e, oj space (corresponding to the flat priors we 
adopted for these variables) diverges as 1/Vh 2 + k 2 when the variables are changed to h and k. This leads 
to a divergence oc | ln(/i, k)\ at h, k = in the marginalized ID posterior distributions for h, k. These spikes 
render the modes of the h, k distributions unusable as summary statistics. While the probability density 
for each variable diverges, its integral converges, so the mean and the median of the h, k distributions 
are less sensitive to this problem; the corresponding eccentricity measures are e mo( j = y /ij 2 noc i + fc mo d aid 



— x/^moan + ^ 



mean " 



In an effort to compensate for the h-k singularity's effects on summary statistics, we weight each point 
of the chain by its e value (the Jacobian between e, u) and h, k coordinates); values calculated with the 



2 The IDL code with our mode estimator is provided as an online supplement to this paper. 
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use of weights are denoted with a 'w'. This weighting gives an estimate of the posterior distribution for 
the same data using priors fiat in h, k. We calculate the weighted median and weighted mean of the 
posterior distributions of h and k, as well as the values e w _ roed = \ ^w-med + ^w-med an< ^ ^w-mean = 



h 2 A- k 2 

w — mean r w — mean' 



To calculate precision measures for two-dimensional eccentricities, we draw a line in the h, k plane 
through the points (0, 0) and (/i me d 7 fcmod) and project each sample from the Markov chain onto this line. 
The coordinates of the projected Markov chain points are e = (h ■ /i mc( j + k ■ /c me d) / e m ed and can be positive or 
negative. Using the one-dimensional distribution approach described in the previous section, we can compute 
credible intervals for e. The right panel of Figure [3] shows an example e distribution and the corresponding 
credible intervals, with the advantage being that they correctly include e = when it lies within the 68% 
two-dimensional confidence region. Since <r[e] and other precision measures based on the ID confidence 
intervals for h, k, h w , k w differ by only a few percent, we adopt a[e] for all future use. 

In Figure |4] we compare the statistical properties of the summary statistics for eccentricity discussed 
above. All of our summary statistics e moan , e mcd , e mo do, e mcan , e mod , e w _ moan , and e w _ mod are positive 
definite, so there is always a positive bias when e ln = 0. However, the strength of the bias varies with the 
definition used: e mo d c and e me d have nearly Gaussian distributions (or half-Gaussian for e m — 0) around 
e; n , while e mea n and e me d for the one-dimensional eccentricity (red and black histograms in the top row) are 
typically 1 — 2 times a[e] larger than ei n for nearly circular orbits. The magnitude of the bias is determined 
by the typical eccentricity uncertainty of the survey, which is 0.04 (median value of <r[e]) for the B06 catalog. 
Therefore, as seen in Figure 0] the bias is strong for e m < 0.05 and negligible for e m > 0.1. 



4.3. Relationship between input quality and output quality 

In this section we aim to develop a set of diagnostic criteria which allow us to evaluate the quality of 
any radial velocity data set for the purpose of orbit parameter determination. We consider period, velocity 
amplitude, and eccentricity and we gauge the quality of these extracted values with the following metrics: 

• cr[P], <r[if], c[e] are the period, velocity amplitude, and eccentricity precisions defined based on the 
relevant 68% credible intervals as explained in §4.11 

• P — Pi n , K — K SD , e — ej n , where "in" denotes the input value, give the "bias". Output K and P are 
taken to be the medians of the respective marginalized posterior distributions, whereas e mo do is taken 
to represent output eccentricity. 

• (P - P in )/P, (K - K in )/K give the "normalized bias" for P, K. 

We also considered "reliability" , the standard deviation of the five output values of each of P, K, e extracted 
from the five different realizations generated using the same input system, as an indicator of sensitivity to 
noise. We find that, statistically, reliability is indistinguishable from precision — confirming that <j[K], <r[P], 
a[e] are indeed accurate measures of orbit parameter uncertainty — so we do not discuss reliability further. 

Similarly, for each data set we use several input quality metrics: 

• N is the number of points in the data set. 

• iVp er is the number of periods covered by observations, (t max — £ m in)/P- 

• If Vi, cr bs,i are the observed radial velocities and their observational uncertainties, then the effective 
error-weighted precision of the data set is cr bs = (l/ CT obs i) 1 ^ 2 - 

• The effective signal-to-noise ratio is Ky/N /cr b s - 

• $max is the maximum gap in phase coverage in the phase-folded data set (we define the phase to run 
from to I). 
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Not all of the data set quality parameters are independent. We evaluate a rank correlation coefficient for 
every pair of these parameters (10 pairs), as well as the corresponding probability that the two parameters 
in the pair are uncorrelated. Two correlations are present at the 99.9% signhcance level: N pcT is positively 
correlated with the effective signal-to- noise and N is negatively correlated with $ max . With these caveats 
in mind, we look for correlations between the data set parameters and the output metrics for all orbit 
parameters. 

The strongest correlations relate the eccentricity precision a[e] and the effective signal-to-noise; the 
normalized period precision a[P]/P and the number of periods covered, N peI ; and the velocity amplitude 
precision <r[K] and the maximum phase gap $ m ax (Figure[5]). The corresponding Spearman's rank correlation 
coefficients are respectively —0.844, —0.881, 0.537. To the extent we can generalize from our simulated data 
to other radial velocity data sets, these correlations suggest rough guidelines for the output precisions one 
can expect from a data set with given values of effective signal-to-noise, $ max , and AT per as follows. 

• An eccentricity precision better than 0.05 requires an effective signal-to-noise greater than ~40. 

• Observations over one complete orbital period typically result in a normalized period precision of 10%, 
and a normalized period precision of 1% requires a time baseline of 2 — 3 periods. 

• Assuming radial velocity measurement uncertainties typical of the B06 catalog (3.7±1.8 m/s), a velocity 
amplitude precision of 3 m/s requires a maximum phase gap of less than about 0.3 orbit period for 
low- to moderate-eccentricity systems corresponding to our e- ln — 0, 0.05, 0.1 and a maximum phase 
gap of less than about 0.15 orbit period for moderate- to high-eccentricity systems corresponding to 
our ei n = 0.3, 0.6. 

These guidelines correspond to median relations between the input and output quality metrics. Upper 
envelopes of these relations can be parametrized as 

logcr[e] = 0.48 - 0.89 x \og{K %/iV/cj obs ); (7) 
\og{a[P]/P) = -1.23 - 1.00 x log(iV per ); (8) 
log(a[X],m/s) = 1.36 + 0.89 x log($ max ). (9) 

These relations are obtained by fitting power laws to the correlations in Figure [5] and then adjusting the 
normalization so that 90% of all points are below relations ([7])-(j9|). 

Notably, we see no correlation between our output quality metrics and the nu mber of observations N 



(Figure [5j3,f). This seems counter- intuitive and appears to contradict the findings of IShen fc Turnerl (|2008). 
The explanation for this apparent discrepancy is that our data involve an ensemble of systems rather than 
a single system viewed in a variety of observing situations. For an ensemble, the relation between output 
quality and N is complicated because new planet systems are generally published when they have passed 
some minimum reliability threshold regardless of how many observations were made to attain that threshold. 

To illustrate the effects of different N in observing a single system, we take two mock radial velocity 
data sets, one with e m = and one with e- m = 0.6, for a high signal-to-noise system with a large number 
of observations (70 Vir) and decrease N by deleting some observations at random. We then re-analyze the 
reduced data sets using our method and calculate orbit parameters and their precisions based on the MCMC 
output. The results are shown in Figure |5] In this setup the precision of orbit parameters indeed improves 
as the number of observations N increases: a decreases roughly as the expected N~ x / 2 for e- m = but for 
e- m = 0.6 shows a scaling between TV -15 and iV -10 . A possible explanation for this difference is that the 
improved phase coverage associated with larger TV improves orbit parameter determination more for a high 



eccen tricity system, where the cadence of observations near periastron is especially important ([Endl et al 



20061 ). than for a low-eccentricity one. 
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5. Correcting for eccentricity bias in radial velocity surveys 

5.1. Comparison with published orbit parameters 

In this section we apply our different eccentricity measures to real radial velocity data sets. Of the 91 
B06 systems used to generate our mock radial velocity curves, we now consider the subsample of 65 systems 
which are well-fit by a single planet — that is, those for which the root mean square residuals about the best- 
fit orbit solution for the system's largest planet are less than 15 m/s. Of these 65 systems, our Markov chains 
failed to converge for 14 Her. Since more rec ent work on 14 Her shows two planets with velocity amplitude 



ratio K c /Kj, ~ 0.3 (jWittenmver et alJ|2007D , we eliminated it from our subsample. Of the remaining 64 
systems, HD 190360 also contains multiple planets. However, in this case our single-planet method finds 
the solution fo r the biggest pla net, presumably because the second planet's relative contribution is smaller 



(K c /K b ~ 0.2. IVogt et alJl2005h . 



For these 64 systems we compare our eccentricity estimators with published solutions, which are typically 
determined by fitting a Keplerian orbit to the radial velocity observations and minimizing x 2 in the relevant 
7-parameter space. There are some minor differences between our analysis and that used for the B06 and 
other published solutions. In treating jitter, published solutions usually use observed stellar properties to 
fix a j while we derive aj from our MCMC output. Furthermore, the space in which our MCMC priors 
are flat — namely, [InP, ln(l + K/K ), ui, e, Mo, C, ln(l + aj/aj t o)] — differs in its paramctrization 
from the space over which published solutions typically minimize \ 2 • Nevertheless, we find that our e me an, 
the mean of the marginalized posterior distribution for eccentricity, is statistically very similar to the B06 
published eccentricities (see Figure[7]). The median difference between e mcan and the B06 catalog eccentricity 
is < 0.01er[e]; this remains true when only planets with e < 0.1 are considered. We find that using e mo de 
results in eccentricity estimates systematically smaller than the published ones. The median difference 
between e moc je and the B06 values is — 0.25er[e]; when we consider only e mo dc < 0.1 planets, it is — 0.6cr[e]. 

Our analysis in £|4.3l suggests that the bias in eccentricity should decrease with increasing quality (signal- 
to-noise) and number of observations. Using http://exoplanets.org, we selected all 34 planets in single- 
planet systems for which at least six years elapsed between the discovery announcement and the most recent 
published orbital solution. The comparison of eccentricity measurements and their uncertainties between the 
'old' (discovery) orbital solutions and the most recent 'new' ones is presented in Figure [5J The uncertainties 
in eccentricity decreased as more and/or better observations were collected. Since eccentricity is a positively 
biased measure, as uncertainties decreased the values of eccentricity decreased as well. 



5.2. Effects of choice of eccentricity estimator 

In this section we examine the effect of the eccentricity estimators on the determination of the observed 
fraction of planets f eo with eccentricities < eo among the subsample of 64 B06 systems with good one-planet 
fits. We further exclude HD89307 for which Markov chains corresponding to the mock data sets did not 
converge due to poor sampling and few observations, yielding 63 systems. In cases where the a[e) and hence 
the bias are comparable to the threshold eo, we expect some nearly circular orbits to be misidentified as 
significantly eccentric. Where cr[e] <C eo, the effect of bias should be negligible and the choice of eccentricity 
estimator is less critical. We report f eo for multiple eccentricity estimators in Tabic [T] 

Although e mo de and e me d are less biased than e moan (Figure SJ , all three estimators must be biased for 
sufficiently circular orbits as they are positive definite. As an illustration, we consider the subsample of the 
63 x 10 = 630 simulations generated using the data for the 63 systems with input eccentricities e; n = and 
e- m = 0.05. For the 315 simulations with e ln — 0, we measure /0.05 = 53% using e mcan and /0.05 = 80% using 
emode, suggesting that this measure would still misidentify as eccentric a fifth of circular exoplanet orbits. 
For the simulations with e; n = 0.05 we measure /0.05 — 31% using 6 mean and /0.0s = 52% using e mo dc- Using 
e m ode as an eccentricity estimator essentially eliminates the bias for e; n > 0.05, but some bias remains for 
smaller eccentricities. 
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0.10 
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0.10 
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0.23 


0.39 




^mcd 


0.08 


0.25 


0.39 



Table 1: Summary of estimates of fractions of low-eccentricity planets. 



5.3. The fraction of planets on nearly circular orbits 

In this section we estimate the true underlying eccentricity distribution, particularly near e = (Table 
2). We focus on the 63 B06 systems which are well- fit by a single planet and for which mock data sets 
have convergent Markov chains. Let p(e, e; n )de be the probability that we observe a B06 system to have 
eccentricity between e and e + de if all the systems have true eccentricity ei n . If all measurements were 
perfect, p(e, e- m ) — 5(e — e; n ), but in practice this function is determined by the combination of eccentricity 
precisions in the B06 catalog. Our simulations yield p(e, e- m ) for e; n = 0,0.05,0.1,0.3,0.6. If D m (ei n ) is the 
true eccentricity distribution in our 63-system sub-sample, then the observed distribution is 

£>obs(e) = / An(ein)p(e, ein)dei n . (10) 



J 

Integrating equation (|T0| over e from to e gives the observed cumulative distribution 

Cobs(e) = / An(ein)-P(e,ei n )de in , (11) 
Jo 

where P(e, e- m ) is the cumulative distribution corresponding to p(e, ei n ). In Figure [3] (left), we show P[e, e- ln ) 
for the five input values of eccentricity measured in our simulations. 

As a first step we assume an underlying eccentricity distribution of the form D cst = AQS(e— 0)+Ag.o5<$(e— 
0.05) + A ,i5(e — 0.1) + A , 3 S(e — 0.3) + A , 6 S(e — 0.6) where the constants Ai sum to unity. Estimating the 
underlying eccentricity distribution then amounts to finding coefficients Ai such as C est (e) = J2i AiP(e, e- ln ^) 
best represents the observed cumulative distribution C b s (e)- We find the best-fit A; by minimizing the 
Kolmogorov-Smirnov (KS) statistic between C b s (e) and C cs t(e) (Figure [SJ right). Using e mo d c as the 
estimator, we obtain A = 0.26, A , 05 = 0.15, Aq.i = 0.19, A . 3 = 0.20, A . 6 = 0.21, putting 33% of the 
63 planets on orbits with e < 0.05. In principle, this method can be used with any eccentricity estimator, 
as long as the function P(e, ej n ) represents the cumulative distribution for the same estimator. The derived 
/o.05 varies in the range 21% — 45% depending on the estimator, with 33% being the median value. 

In reality, we expect a continuous underlying eccentricity distribution. Because it is impractical to 
calculate P(e, e in ) via Monte Carlo simulations for a large number of input eccentricities, our method is to 
adopt an analytical form for P(e, e- ln ) which agrees with the functions derived for the five values of e- ln in 
mock systems and may be generalized for other values of e in . We use e mean as our eccentricity estimator for 
this method, as this is the estimator closest to the one used in real radial velocity surveys. If all systems in 
the B06 catalog had the same eccentricity precision and if the obs erved values of e cos uj and esinw followed 
Gaussian distributions with dispersion S about their true values ( Shen fc Turner 2008h . then 



eexp(-^)7 (f^) 
e m) = -j V 7 . ,, 2N ; (12) 
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and P(e,e; n ) = J p(e', ei n )de'. As we show in Figure [TOI such distribution does not accurately reproduce 
P{&, ei n ) from our simulations. This occurs because the catalog combines systems with a range of eccentricity 
precisions. 

Much better fits to P(e, e- lB ) from simulations are obtained by assuming that e cos w and e sinw follow an 
exponential distribution in (— |e — ei n |) or a sum of two exponential distributions. For our single exponential 
model, we take 



P(e,e in ) 



r2ir , / -v/e 2 +e 2 — 2eei n cost, 

ej dui exp | — E7fin"o732l 



Io de ' e'f^duexp 



\J (e / ) 2 +e? n — 2e'ei n cqsuj 
E/| ln0.32| 



(13) 



where the factors of | ln0.32| are included so that S corresponds to the 68% confidence interval in h or k. 
We emphasize that E in this expression is not an eccentricity uncertainty for any specific data set, but a 
fitting parameter for the entire ensemble of systems. We choose S = 0.0302 by minimizing the sum of the 
KS statistics between each of the MCMC result distributions for the five e ln and its corresponding model 
p(e, ei n ). Similarly, for our double exponential model we take 



P(e,e in ) 



exp 



-y/ e 2 +e 2 n — 2eei n cos u) 
Si/| ln0.32 



, ; i \J e 2 +e? n — 2eei n cos a; 
ex P [ E 2 '/ n |ln0.32| 



Jo 1 de ' e ' lT duj 



exp 



2 +e. — 2e'ei n cos uj 



Si/| ln0.32| 



T , , -v/(e') 2 +e 2 — 2e'ei n cosu; 
BeX P ( S 2 /°ln 0.32| 



(14) 



with three parameters — the widths Si, £2 and the amplitude B — which vary linearly with ei n . The double 
exponential models are somewhat better fits than the single exponential models for all e ln except e- m = 
(Figure [TU]) . Models (fT5|) - (fr¥|) yield similar cumulative eccentricity distributions and /0.02, /0.05 values so 
below we discuss the single exponential model. 



We discretize the intrinsic eccentricity distribution as 

£> e8t (e) =Y^Ai5{t 



(15) 



and fit for Ai as in our first example of five input eccentricities. For the grid of used in -D es t we try 
eccentricity spacings 0.02, 0.03, 0.04, 0.05 and perform a bootstrap analysis on the inversion using each of 
the four e ln grids. The underlying eccentricity distributions inferred using the four grids are close to one 
another (Figure [IT]) and all are well above the observed eccentricity distribution for e < 0.1. The resulting 
fractions of planets on nearly circular orbits are /0.02 = 0.27±0.11 and /0.05 = 0.38±0.09, much higher than 
the B06 values of 0.06 and 0.17, respectively. We apply the same analysis to the 51 systems with P > 10c? and 
find underlying values /0.02 = 0.13 ± 0.05, /0.05 = 0.28 ± 0.08, much larger than the B06 values of /0.02 = 
and /0.05 = 0.10. We can likewise apply this analysis to the set of all 117 B06 single-planet systems if we 
assume that it is sufficiently statistically similar to the subset of the 63 planets in our simulations that the 
same P(e, e; n ) can be used in both cases. This yields underlying values /0.02 = 0.30±0.08, /0.05 = 0.32±0.06, 
whereas on the basis of the published eccentricities we would calculate /0.02 = 0.06 and /0.05 = 0.18. 

Another method to obtain the intrinsic eccentricity distribution is to directly solve for -Din(em) in 
equat i on (| 1 01) with functions p(e, e- ln ) from equation (|13|) using the iterative deconvolution procedure of 
Lucvl (|1974f ). Specifically, given the 0th guess for the intrinsic eccentricity distribution Df (e- ln ) (e.g., a flat 
distribution), the subsequent iterations are obtained using 



^fn(ein) 

N 



N 



p(ei,e in ) 



J D r ln {e m )p(e l ,e m )Ae m 

This method yields /0.05 = 24%, and the result of the deconvolution is illustrated in Figure IT2k. 



(16) 



Our model functions P(e, e- m ) from equation (|13[) deviate slightly from the results of the simula- 
tions. In particular, for e- m = 0.05 and e- m = 0.1, the model functions overpredict P(e,ei n ) by < 0.1 
(Figure I10( bottom) . To estimate the uncertainties introduced into f eo by using the model approxima- 
tion, we expand the integrand of equation (fTTj) around the model functions, £) actual = £)modci _|_ 
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and p actual — pmodci _|_ re ^ a j n ^he fi rs t order in corrections AD and AP and use the constraint 
C obs (e) = £)£ odel (e in )P model (e, e in )de in . This allows us to relate the correction AD to the known func- 
tions p modei > /jmodci anc j ^he latter can be estimated from Figure ITU]). This procedure suggests that 
/o.i derived using model functions is overestimated by about 0.03, well within the uncertainties of the de- 
convolution procedure. We note that the 5-functions method described in the beginning of this section is 
not affected by this systematic error and therefore provides an independent check. 
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0.18 


0.24 


0.49 
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^mcan 


0.13±0.05 
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Table 2: Estimates of the true underlying fraction of planets on nearly circular orbits using different methods 



6. Summary 

In this work, we constructed a catalog of mock radial velocity data for 2275 artificial single-planet 
systems with eccentricities 0, 0.05, 0.1, 0.3, and 0.6 and all other orbital parameters drawn from the real 
radial velocity data sets. We analyzed these data using MCMC simulations and compared the input and 
extracted orbital parameters in order to study potential biases introduced into the population of known 
radial velocity single-planet systems by the orbit extraction process. 

We paid particular attention to eccentricity biases because of its significance for testing planet formation 
models. Eccentricity is positive definite, and therefore a measurement bias is present, especially for planets on 
small-eccentricity orbits. The mode of the marginalized posterior one-dimensional eccentricity distributions 
output by MCMC simulations was the least biased of the eccentricity estimators we considered . The 
mode outperformed the mean e moan and median e mG( jian of the posterior eccentricity distribution as well as 
estimators based on mean or median values of h = e cos a; and k — e sinw. Since our e moan closely reproduces 
eccentricities derived using the standard ^-minimization methods, we suggest that the eccentricities derived 
and reported for planets with intrinsic eccentricities < 0.05 are typically biased high by la — 2a, while those 
for planets with intrinsic eccentricities > 0.1 are not significantly biased. We recommend e mo d e , the mode of 
the marginalized posterior distribution for eccentricity, as the preferred eccentricity estimator in observational 
studies of exoplanet population statistics. This requires minimal effort as many radial velocity exoplanet 
surveys already use MCMC analysis for orbital parameter estimation. 



The study most closely related to ours is that of IShen fc Turnerl (|2008f ). These authors isolate the 
dependence of the quality of individual derived orbit parameters on the number of observations and on the 
signal-to-noise ratio by varying these parameters separately in the radial-velocity data for one single-planet 
system and extracting orbits via \ 2 minimization. Their input orbit parameters differ somewhat from ours; 
in particular, they include many systems with much lower weighted signal-to-noise than ours and they do 
not consider the maximum phase gap or number of periods covered. Their findings that eccentricity bias 
preferentially affects low-eccentricity systems and that eccentricity errors decrease strongly with increasing 
weighted signal-to-noise are qualitatively consistent with ours. Our approach emphasizes the role of a 
realistic ensemble of planetary systems in shaping the relations between input data set quality and output 
orbit quality, particularly eccentricity bias. Since we draw our systems from a real survey, this allows us to 
estimate the underlying fraction of nearly circular exoplanet orbits. 

Using several methods, we estimate the true underlying fraction of planets on nearly circular orbits 
among B06 single-planet systems. We find that the fractions of planets with eccentricities below 0.02 and 



- 13 - 



below 0.05 are respectively /o,o2 = 0.27±0.11 and /0.05 = 0.38±0.09 — significantly higher than /0.02 = 0.06, 
/0.05 — 0-17 computed using the B06 published eccentricities. When we exclude systems with periods less 
than 10 days, we find values /0.02 = 0.13 ± 0.05, /0.05 = 0.28 ± 0.08, again significantly larger than the 
Jo. 02 = 0, Jo. 05 = 0-10 of the B06 published eccentricities. This suggests that low eccentricities like those 
seen among major planets in our solar system may not be as unusual among radial velocity exoplanets as has 
previously been believed. In particular, the eccentricity distribution (corrected for biases) is not well matched 
to the eccentricity distribution of dyna mically active systems t hat went through a phase of planet-planet 
scattering, dN oc eexp(— ^ (e/0.3) 2 )de |juric fc Tremainei l2008h . which would result in almost no planets 



on nearly circular orbits. In Figure 112b . we represent the instrinsic (de-biased) eccentricity distribution as 
a linear combination of a population of pl anets on circular orbits (38% of systems) and a population of 
dynamically active systems described by the Juric fe Tremainei ( 2008h distribution (62%). Therefore, a large 



fraction of all planets may have avoided a phase of planet-planet scattering, for example because they were 
formed in systems with few very massive planets. Alternatively, these systems may have been dynamically 
active, but the eccentricities may have been d amped by some mechanism (e.g., by the residual disk material, 
Raymond et al. 2009t Matsumura et al. 2010|) . 



While we found no overall bias among periods and velocity amplitudes extracted from our mock planet 
catalog, the errors we derived for our periods, velocity amplitudes, and eccentricities suggest rough guidelines 
for the quality of extracted orbit parameters one might reasonably expect from real radial velocity data sets. 
Specifically, an eccentricity precision < 0.05 typically requires weighted signal-to-noise > 40; a normalized 
period precision of 1% is achieved in only two-three orbital periods; and a velocity amplitude precision 
of 3 m/s typically requires a maximum phase gap < 0.3 period for low- to moderate-eccentricity systems 
and < 0.15 period for moderate- to high-eccentricity systems. These guidelines correspond to the median 
relationships seen in our data. 

Limitations of our study include our restricting our mock catalog to single-planet systems with fewer 
than 90 observations. Thus, our results exclude the best-studi ed systems. Histor ically, there is a tendency 
for reported eccentricities to decrease with more observations ( Butler et al. 20061 and £!5.1j) , sometimes due 



to the discovery of additional planets. Nevertheless, we expect that our main findings can be generalized to 
other radial velocity surveys. Due to competition for observing time, few systems are followed up simply 
to achieve higher precision in parameter determination. Thus eccentricity bias remains significant at low 
eccentricity, and simply increasing the sample size of known planets is insufficient to ensure that the observed 
eccentricity distribution approaches the underlying one. 

Our results also exclude the ~30% of radial-velocity planets found in m ultiple-planet systems. Such 
planets have an eccentricity distribution similar to that of single-planet systems (|Wrighdl2009h . so eccentricity 
bias of the kind described in this work is likely an important consideration in the population statistics of 
multiple planet systems as well. 



N.L.Z. was supported by the Spitzer Space Telescope Fellowship provided by NASA through a con- 
tract issued by the Jet Propulsion Laboratory, California Institute of Technology; by the John N. Bahcall 
Fellowship at the IAS; and by the NSF grant AST-0807444. M.P. was supported by AMIAS and Taplin 
memberships at the IAS. E.B.F. was supported by NASA Origins of Solar Systems grant NNX09AB35G 
and the University of Florida. M.P. and E.B.F. were supported in part by the NSF grant PHY05-51164, by 
Kavli Institute for Theoretical Physics at University of California, Santa Barbara, and by the Aspen Center 
for Physics. The authors would like to thank Mario Juric and Scott Tremaine for discussions and the referee 
for the comments on the manuscript. 



REFERENCES 

Adams, F.C. & Laughlin, G. 2003, Icarus, 163, 290 
Balan, S.T. & Lahav, O. 2009, MNRAS, 394, 1936 



-14- 



Batygin, K., Laughlin, G., Meschiari, S., Rivera, E., Vogt, S., & Butler, P. 2009, ApJ, 699, 23 
Butler ct al. 2006, AJ, 646, 505 (B06) 

Chatterjee, S., Ford, E.B., Matsumura, S., & Rasio, F.A. 2008, ApJ, 686, 580 
Cumming, A. 2004, MNRAS, 354, 1165 

Endl, M., Cochran, W.D., Wittenmyer, R.A., & Hatzes, A.P. 2006, AJ, 131, 3131 
Fabrycky, D. & Tremaine, S. 1007, ApJ, 669, 1298 
Ford, E.B. 2005, AJ, 129, 1706 
Ford, E.B. 2006, ApJ, 642, 505 
Ford, E.B., 2008, ApJ, 135, 1008 

Ford, E.B. & Gregory, P.C. 2007, ASP Conf. Scr., 371, 189 

Ford, E.B. & Rasio, F.A. 2006, ApJ, 638, L45 

Ford, E.B. & Rasio, F.A. 2008, ApJ, 686, 621 

Ford, E.B., Havlickova, M., & Rasio, F.A. 2001, Icarus, 150, 303 

Gregory, PC. 2005, ApJ, 631, 1198 

Gregory, PC. 2007, MNRAS, 374, 1321 

Holman, M., Touma, J., & Tremaine, S. 1997, Nature, 386, 254 

Juric, M. & Tremaine, S. 2008, ApJ, 686, 603 

Laughlin, G. & Adams, F.C. 1998, ApJ, 508, L171 

Lucy, L.B. 1974, AJ, 79, 745 

Lucy, L.B. & Sweeney, M.A. 1971, AJ, 76, 544 

Malmberg, D. & Davics, M.B. 2009, MNRAS, 394, L26 

Malmbcrg, D., de Angeli, F., Davies, M.B., Church, R.P., Mackey, D., & Wilkinson, M.I. 2007, MNRAS, 
378, 1207 

Marzari, F. & Weidenschilling, S.J. 2002, Icarus, 156, 570 

Marzari, F., Weidenschilling, S.J., Barbieri, M., & Granata, V. 2005, ApJ, 618, 502 

Matsumura, S., Takeda, G., & Rasio, F.A. 2008, ApJ, 686, L29 

Matsumura, S., Thommes, E.W., Chatterjee, S., Rasio, F.A. 2010, ApJ, 714, 194 

Murray, CD. & Dermott, S.F. 1999, Solar System Dynamics (Cambridge University Press, Cambridge, New 
York) 

Nagasawa, M., Ida, S., & Bessho, T. 2008, ApJ, 498 

O'Toolc, S.J., Tinney, C.G., Jones, H.R.A., Butler, R.P., Marcy, G.W., Carter, B., & Bailey, J. 2009, 
MNRAS, 392, 641 

Rasio, F.A. & Ford, E.B. 1996, Science, 274, 954 

Raymond, S.N., Armitage, P.J., Gorclick, N. 2009, ApJ, 699, L88 

Shen, Y. & Turner, E.L. 2008, ApJ, 685, 553 



- 15 - 



Tremaine, S. k Zakamska, N.L., 2004, AIP Conf. Proc, 713, 243 
Takeda, G. k Rasio, F.A. 1005, ApJ, 627, 1001 

Vogt, S.S., Butler, R.P., Marcy, G.W., Fischer, D.A., Henry, G.W., Laughlin, G., Wright, J.T., k Johnson, 
J.A. 2005, ApJ, 632, 638 

Weidenschilling, S.J. k Marzari, F., 1996, Nature, 384, 619 

Wittenmyer, R.A., Endl, M., k Cochran, W.D. 2007, ApJ, 654, 625 

Wright, J.T. 2005, PASP, 117, 657 

Wright, J.T. 2009. larXiv:0909.0957l (conf. proc, submitted) 
Wright, J.T., k Howard, A.W. 2009, ApJS, 182, 205 
Zakamska, N.L. k Tremaine, S. 2004, AJ, 128, 869 



This preprint was prepared with the AAS IATjrjX macros v5.2. 



- 16 - 




period (days) published eccentricity 



Fig. 1. — Left: periods and eccentricities of all known radial velocity planets (grey) and 63 systems from 
B06 planets for which a single planet provides a good fit, as described in (black). Right: comparison 
of eccentricity distributions. Values of periods and eccentricities of 385 radial velocity planets were taken 
from exoplanet.eu as of end April 2010 (the list includes short-period planets discovered using the transit 
method but which have radial velocity observations). The two eccentricity distributions are consistent with 
each other in the sense of the Kolmogorov-Smirnov test. The fraction of planets with published eccentricities 
< 0.05 is /o.o5 = 17 — 30% (the two values are for B06 and for exoplanet.eu planets, respectively), and the 
fraction of those with e < 0.1 is /o.i = 38 — 41%. Excluding planets with P < 4 days, /0.05 = 9 — 17%. 




period, days velocity amp., rn/s 



Fig. 2. — Example marginalized posterior parameter distributions obtained from our MCMC simulations for 
a typical mock radial velocity data sets. The input eccentricity was and the other orbit parameters were 
from HD213240b. The left panels shows the marginalized period distribution and the right panel shows the 
velocity amplitude distribution. The vertical line shows the input value. The dashed line shows a Gaussian 
distribution with the dispersion determined from the 68% credible interval. 
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1D eccentricity e k = e-cos(u) 8=(h-h m +k-kJ/V(h£H^J 



Fig. 3. — The left panel shows the marginalized posterior eccentricity distribution for the same simulated 
radial velocity data set as in Figure [5J The input eccentricity was 0. Since eccentricity is positive definite, 
all eccentricities within the credible interval are positive. The middle panel shows projection of the Markov 
chain onto the h — k plane. The grey cross marks the input value (0,0) and the grey ellipses were computed 
using principal component analysis to approximate two-dimensional 68% and 95% credible contours. The 
right panel shows the distribution of the value e derived from h and k (/i m and A: m are the median h and k). 
Although the best-fit e is positive definite, in general the values e can be positive or negative, so the credible 
intervals may be sensibly defined. The horizontal line with points shows the median of the distribution, as 
well as the 68% and the 95% credible intervals. 
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Fig. 4. — Comparison of output eccentricity with input eccentricity for six eccentricity measures. Each 
panel shows histograms of the eccentricity bias (= e ou t — em) normalized to a[e] together with a Gaussian 
distribution (smooth black line); the left, middle, and right columns show data for e m = 0, 0.05, 0.1 respec- 
tively. Top panels show data for eccentricity measures derived directly from the posterior distribution for 
eccentricity marginalized over all other parameters: the blue, red and black histograms correspond to the 
mode (e ou t = e mo dc), mean (e out = e moan ), and median (e ou t = e mc d) of that distribution. While the mean 
and median are quite biased for input eccentricities < 0.05, the bias vanishes at higher eccentricities. Bottom 
panels show the bias for alternative eccentricity measures defined in the h — k plane, with red histograms for 
\/ '•mean + ^mcan an d black for i/^mod + ^mccT Dotted histograms are for the values weighted by eccentricity, 
an estimate of the results we would have obtained using priors flat in h, k. 
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Fig. 5. — Scatter plots illustrating correlations between selected data set quality and output quality metrics: 
a) weighted signal-to-noise vs. eccentricity precision; b) number periods covered vs. normalized period 
precision; c) maximum gap in phase coverage vs. velocity amplitude precision; d) weighted signal-to-noise 
vs. eccentricity bias (the nine points corresponding to eccentricity biases of absolute value between 10~ 6 
and 10 -4 are not shown); e) number of observations vs. eccentricity precision; f) number of observations vs. 
normalized period precision. The wavelength of the color code increases with input eccentricity (purple for 
e- m = 0, blue for e- m = 0.05, green for e- ul — 0.1, orange for e- ln = 0.3, red for e; n = 0.6). The pairs of input 
and output metrics in panels a), b), c) are those we found to be most strongly correlated. Guidelines for 
expected output quality from an RV data set of given input quality based on these strong correlations are 
discussed in $6] Panel d) clearly shows positive bias for small values of input eccentricity e- ln = 0, 0.05, 0.1. 
Bias and uncertainty in eccentricity decline strongly with S/N cff (panels a, d), but are not correlated with 
N (panels e, f). 
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Fig. 6. — Precision of different orbit parameters as a function of the size of the data set, for two values of 
input eccentricity (triangles for e- ln — and circles for e ln = 0.6) for the same system. Mock data sets are 
obtained by randomly throwing away some of the points from the complete set. Each number of points is 
sampled 6 times, and error bars correspond to the variance among these 6 realizations. The dotted line 
shows the best iV -1 / 2 fit to the precisions for e- m — 0. For e m — 0.6 precisions, best-fit power-laws have 
slopes ranging from -1.5 to -1. 
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Fig. 7. — Comparison between our eccentricity measurements and those from the B06 catalog for systems 
with a good one-planet fit. The top panels show the performance of the mean values of the posterior 
distribution e mcan and of their precisions. There is no systematic difference between these values and those 
from the B06 catalog (top middle; shaded grey for all eccentricities, dashed outline for those < 0.1, and solid 
outline for those > 0.1), so this is the measure we consider to be the closest to the published values. The 
bottom panels show the comparison between the published values and our preferred estimator e mo d c - This 
estimator returns values of eccentricity which are typically 0.25<r lower than those in the published catalog 
(middle panel, shaded grey histogram). When we consider only planets with e mo dc < 0.1 (dashed outline) 
this difference increases to 0.6a, but for planets with e mo d c > 0.1 alone this difference becomes negligible. 
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Fig. 8. — The comparison between the discovery ('old') orbital eccentricites and their uncertainties and the 
most recent ('new') ones for the 34 systems from http:/ /exoplanets.org for which six or more years separate 
the two epochs. The dotted lines show a one-to-one correspondence to guide the eye. As time went on, 
the effective signal-to-noise of observations increased, leading to decrease in eccentricity uncertainties and 
therefore in eccentricities themselves. 
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Fig. 9. — Left panel: cumulative distributions P(e, e- m ) (eq. determined from our simulations for all five 
input values e in = 0,0.05,0.1,0.3,0.6 (increasing from left to right). The output measure of eccentricity is 
emode throughout this figure. Right panel: the cumulative distribution of eccentricities in the real extrasolar 
planetary systems (black) and its best approximation as a linear combination of the five functions P(e, ei n ) 
from our simulations (grey). The coefficients of this linear fit allow us to determine the intrinsic fraction of 
planets on nearly circular orbits: /0.05 = 0.33. 




Fig. 10. — Comparison between P(e, e; n ) derived from our MCMC simulations, with e mean as the eccentricity 
measure, and the analytic approximations from i )5.3l (the points and curves are, from left to right, for e- ln = 0, 
ejn = 0.05, ein = 0.1, e- m = 0.3 and ei n = 0.6, or purple, blue, green, orange and red, correspondingly, in the 
on-line version of the paper). The top panel shows Gaussian models (eq. I12j) . For each P(e, ei n ), we try two 
different values of £. For small eccentricities and data sets with many observations evenly distributed over 
orbit phase, we expect £ ~ median(v / 2o'obs/-^'V^V), shown with solid lines. We also try £ = median(cr[e]), 
shown with dashed lines. While the former produces nominally better fits than the latter, both sets poorly 
describe the tails of the MCMC result distributions, especially for e ln = and e; n = 0.05. The bottom panel 
shows single exponential models with the fitted £ = 0.0302 (dashed lines) and double exponential models 
with £1 = 0.0305, £ 2 = 0.00874 - 0.00719e in , B = 3.46 + 20.9e in (solid lines). Exponential models are 
significantly better than the Gaussian for reproducing the tails of the MCMC-derived distributions. The 
double exponential model is better than the single exponential one for all ej n values except e ln = 0. 
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Fig. 11. — Estimates of the underlying cumulative eccentricity distribution for 63 B06 systems (top plot) 
and for the 51-system subset of those with period longer than 10 days (bottom plot). In both plots the 
inset panel provides a larger view of the lower left-hand corner of the main plot. Results of a bootstrap 
analysis using a single exponential model for p(e, ei n ) with e- m grid spacings 0.02, 0.03, 0.04, 0.05 in the top 
plot and 0.03, 0.04, 0.05 in the bottom plot appear as dashed lines; longer dashes correspond to larger grid 
spacings. The white line shows the average of the results obtained using the different grid spacings, and the 
shading gives the la uncertainty obtained by averaging the bootstrap standard deviations obtained using 
the four grid spacings. As the top plot indicates, the implied underlying fractions of single-planet systems 
with eccentricities below 0.02, 0.05 — respectively /0.02 = 0.27 ± 0.12, / .05 = 0.38 ± 0.09 — are larger than 
the /0.02 = 0.06, /0.05 = 0.17 of the published eccentricities (orange) by 1.9a and 2.3ct. For the systems with 
P > lOd the bottom plot indicates underlying values /0.02 = 0.13 ±0.05, /0.05 = 0.28 ±0.08, which are larger 
than the /0.02 = 0, /0.05 = 0.10 by 2.6 and 2.3 standard deviations, respectively. We include vertical lines at 
e = 0.02, 0.05 to guide the eye. 
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Fig. 12. — Left: Observed cumulative eccentricity distributions in b lack and the Lucy-deconvolved eccen- 
tricity distribution in grey. The eccentricity estimator is e moa n, and iLucvl (|1974h deconvolution is applied 
as in equation ()16[) with functions p given by equation (|13|) . Right: The deconvolved distribution (grey) is 
represented as a sum of two populations: a population of planets on circular orbits and another population 
of planets with an eccentricity distribution dN oc eexp(— i(e/0.3) 2 )de. The solid black line is the best 
linear fit, with 38% of all systems attributed to the former component, meant to represent systems which 
are dynamically inactive or have been circularized, and 62% to the latter, representing dynamically active 
systems resulting from planet-planet scattering. 



